Introduction


In [9] Decell and Smiley and in [2] Decell and Quirein have results
that suggest the possibility of using a sequential monotone process for solving
the feature selection problem (multivariate normal populations and best k
linear combinations) using Householder transformations. The results are
general in that they apply to a large class of separability criteria [9].

In this report these results will be applied to the divergence separability
criterion and an expression for the gradient of the divergence (in the reduced
feature space) with respect to the generator of a single Householder transformation
will be developed. This expression for the gradient can be used in any number
of differential correction schemes (iterators) that attempt to extremize the
divergence (in the reduced feature space).

Two data sets provided by the Earth Observations Division, JSC, are used
to demonstrate selecting the Householder transformations that generate the
k × n matrix defining the "best" (in the sense of extremizing the divergence)
k linear combinations of features. The tests allow initial comparisons to
be made with results obtained in [2]. In particular, this new technique
does not appear to require initial guesses for the iterator to be generated
by the without-replacement, exhaustive-search, or other similar schemes.

An Expression for the Gradient 

Using the results in [2] and [9] we need only calculate the gradient
of the function D_B (the divergence in the reduced feature space),

where B = (I_k | Z)(I − 2UU^T),
Σ_i, i = 1, ..., m are the class covariances, and
δ_ij are the differences in class means.
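To make the criterion concrete, the following sketch evaluates a divergence of this kind numerically. It assumes the standard pairwise divergence for normal densities, D(i,j) = 1/2 tr[(Σ_i − Σ_j)(Σ_j^-1 − Σ_i^-1)] + 1/2 tr[(Σ_i^-1 + Σ_j^-1) δ_ij δ_ij^T], summed over all class pairs, with Σ_i replaced by B Σ_i B^T and δ_ij by B δ_ij in the reduced space. The function names, the plain-Python matrix helpers, and the choice to sum rather than average over pairs are illustrative assumptions, not taken from the report.

```python
from itertools import combinations

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(A)
    M = [[float(x) for x in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def pair_divergence(S1, S2, d):
    """Divergence between N(mu1, S1) and N(mu2, S2), with d = mu1 - mu2."""
    S1i, S2i = inverse(S1), inverse(S2)
    n = len(S1)
    # tr[(S1 - S2)(S2^-1 - S1^-1)], written as a double sum
    cov_term = sum((S1[i][j] - S2[i][j]) * (S2i[j][i] - S1i[j][i])
                   for i in range(n) for j in range(n))
    # d^T (S1^-1 + S2^-1) d
    mean_term = sum(d[i] * (S1i[i][j] + S2i[i][j]) * d[j]
                    for i in range(n) for j in range(n))
    return 0.5 * cov_term + 0.5 * mean_term

def total_divergence(covs, means, B=None):
    """Sum of pairwise divergences; if B is given, work in the reduced space."""
    if B is not None:
        Bt = [list(col) for col in zip(*B)]
        covs = [matmul(matmul(B, S), Bt) for S in covs]
        means = [[sum(b * x for b, x in zip(row, mu)) for row in B] for mu in means]
    total = 0.0
    for i, j in combinations(range(len(covs)), 2):
        d = [a - b for a, b in zip(means[i], means[j])]
        total += pair_divergence(covs[i], covs[j], d)
    return total

# Two hypothetical classes in two dimensions.
covs = [[[2.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 3.0]]]
means = [[0.0, 0.0], [1.0, 2.0]]
D_full = total_divergence(covs, means)                    # all features
D_first = total_divergence(covs, means, B=[[1.0, 0.0]])   # first feature only
```

In this toy case the reduced value is smaller than the full value, as expected, since a linear reduction cannot increase the divergence.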

First, taking the differential of D_B, it is easily verified that

If we calculate the differential of λ(U^T U − 1) with respect to U we have

Clearly the differential of λ(U^T U − 1) with respect to λ is

so that if we define the matrix

It follows that
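The two differentials of the constraint term referred to above follow from the product rule. As a worked step, assuming λ is a scalar Lagrange multiplier and U is a column vector (the notation here is an assumption):

```latex
\begin{aligned}
d_U\!\left[\lambda\,(U^{T}U - 1)\right]
  &= \lambda\,\bigl(dU^{T}\,U + U^{T}\,dU\bigr) = 2\lambda\,U^{T}\,dU,\\
d_\lambda\!\left[\lambda\,(U^{T}U - 1)\right]
  &= (U^{T}U - 1)\,d\lambda .
\end{aligned}
```

Setting the second differential to zero for arbitrary dλ recovers the constraint U^T U = 1.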
Routine to find Maximum Average Divergence


I. Take the starting value U_0 = (1/√N, ..., 1/√N)^T.

Compute the initial B matrix B(U_0) = (I_k | Z)(I − 2 U_0 U_0^T) and the value of D_B(U_0).
Use a crude variation of the steepest descent method to extremize D_B.
Compute the B matrix with the new value of U and also the corresponding value
of D_B. Repeat the procedure until D_B begins to stabilize.

II. The same procedure as in I except Σ_i is replaced by H_1 Σ_i H_1 and
δ_ij by H_1 δ_ij, where H_1 = (I − 2UU^T) and U is the value obtained for max D_B in I.

III. The same procedure as in II except Σ_i is replaced by H_2 H_1 Σ_i H_1 H_2
and δ_ij is replaced by H_2 H_1 δ_ij.

IV. Continue (V, etc.) until D_B does not increase as a function
of Roman numeral steps. Note that the iteration in each phase (i.e. I, II, III,
etc.) uses the same arbitrary initial guess (1/√N, ..., 1/√N)^T. In addition, an
attempt to satisfy the constraint U^T U = 1 is forced arbitrarily on the steepest
descent procedure. This is a very crude scheme and potentially generates error.
Moreover, the step size α is taken to be constant in all phases and is obviously
inefficient.
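The way the phases compose can be sketched as follows: each phase contributes one Householder matrix H = I − 2UU^T, and the reduced matrix is taken as the first k rows of the accumulated product. This is a minimal sketch assuming B = (I_k | Z) H_p ... H_1; the function names and the renormalization helper are illustrative assumptions, not the report's program.

```python
import math

def normalize(u):
    """Scale u to unit length so that u^T u = 1."""
    norm = math.sqrt(sum(x * x for x in u))
    return [x / norm for x in u]

def householder(u):
    """Return H = I - 2 u u^T for a generator u (renormalized first)."""
    u = normalize(u)
    n = len(u)
    return [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j] for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def reduced_matrix(k, generators):
    """First k rows of H_p ... H_1, i.e. (I_k | Z) H_p ... H_1."""
    n = len(generators[0])
    product = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u in generators:  # generators[0] belongs to phase I
        product = matmul(householder(u), product)
    return product[:k]

# Usage: the starting generator from the text, (1/sqrt(N), ..., 1/sqrt(N))^T,
# followed by a second, hypothetical generator.
N, k = 4, 2
u0 = [1.0 / math.sqrt(N)] * N
B = reduced_matrix(k, [u0, [1.0, 0.0, 0.0, 0.0]])
```

Because each H is symmetric and orthogonal, the rows of B remain orthonormal, which gives a quick sanity check on any implementation.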

The following test cases seem to indicate relative insensitivity to these
crude iteration adjustments. More sophisticated, careful computations are being
implemented to further refine the technique and eliminate these inefficiencies.

The technique will be available on the LARS terminal shortly.
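The crude iteration described in step I (constant step size, with the unit-length constraint forced by renormalizing) can be sketched generically. This is a hedged illustration, not the report's program: it uses a central finite-difference gradient in place of the analytic gradient, and the quadratic stand-in objective, the function names, and the values of alpha, h, and the iteration count are all assumptions chosen for the example.

```python
import math

def normalize(u):
    """Scale u to unit length so that u^T u = 1."""
    norm = math.sqrt(sum(x * x for x in u))
    return [x / norm for x in u]

def numeric_gradient(f, u, h=1e-6):
    """Central finite-difference approximation to the gradient of f at u."""
    grad = []
    for i in range(len(u)):
        up = list(u)
        dn = list(u)
        up[i] += h
        dn[i] -= h
        grad.append((f(up) - f(dn)) / (2.0 * h))
    return grad

def crude_steepest_ascent(f, u0, alpha=0.1, iterations=200):
    """Constant-step gradient ascent; u^T u = 1 is forced by renormalizing."""
    u = normalize(u0)
    history = [f(u)]
    for _ in range(iterations):
        g = numeric_gradient(f, u)
        u = normalize([x + alpha * gx for x, gx in zip(u, g)])
        history.append(f(u))
    return u, history

# Stand-in objective (an assumption): f(u) = 3 u_1^2 + u_2^2, which is
# maximized on the unit circle at u = (+-1, 0). In the setting of the
# text, f would be D_B as a function of the generator U.
def objective(u):
    return 3.0 * u[0] ** 2 + u[1] ** 2

n = 2
u_best, values = crude_steepest_ascent(objective, [1.0 / math.sqrt(n)] * n)
```

The list `values` plays the role of the per-iteration criterion values. With a constant step there is no general guarantee that such a sequence increases at every iteration, which is one reason a fixed step size is a crude choice.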
Results: Data Set I (210 F* ^int )

N = 12, m = 9, k = 6, B is a 6 by 12 matrix.

Total Divergence D = 10660

Iteration j    D_B1j    D_B2j    D_B3j
     1          3686     8221     0697
     2          6639     9246     9730
     3          7769     9786     99?0
     4          78?3     99?4     999?
     5          7605     9987    10018
     6          6093    10020    10035
     7          5825    10028    10047
     8          7279    10032    10056
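As a quick reading of these figures, the last tabulated value can be compared with the total divergence to see how much of the separability the 6 by 12 matrix B retains. The snippet below only does that arithmetic on two numbers quoted above. Treating D_B38 as the final value of the procedure is an interpretation, and the variable names are illustrative.

```python
total_divergence = 10660      # D for Data Set I, from the text
final_reduced = 10056         # D_B38, last entry of the third phase

retained = final_reduced / total_divergence
print(f"retained fraction: {retained:.3f}")   # retained fraction: 0.943
```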


Data Set II (Hill County)

N = 16, m = 5, k = 6
Total Divergence D = 636

298

300